26 research outputs found

    Morphology-Syntax interface for Turkish LFG

    This paper investigates the use of sublexical units as a solution to handling complex morphology with productive derivational processes in the development of a lexical functional grammar for Turkish. Such sublexical units make it possible to expose the internal structure of words with multiple derivations to the grammar rules in a uniform manner. This in turn leads to more succinct and manageable rules. Further, the semantics of the derivations can also be systematically reflected in a compositional way by constructing PRED values on the fly. We illustrate how we use sublexical units for handling simple productive derivational morphology and more interesting cases, such as causativization, which change verb valency. Our priority is to handle several linguistic phenomena in order to observe the effects of our approach on both the c-structure and the f-structure representation, and on grammar writing, leaving the coverage and evaluation issues aside for the moment.

    Building a lexical functional grammar for Turkish

    Large-scale, deep grammars with structurally rich output are basic resources for complex tools in human-computer interaction and also for exploring the linguistic phenomena of a language. In this thesis, we introduce a large-scale grammar for Turkish implemented in the Lexical Functional Grammar formalism. Developing a large-scale grammar requires that several issues be solved, both linguistically and computationally. As the language to be dealt with is Turkish, rich morphological structures play an important role in constructing the basis of the representation. We follow an approach based on building units that are larger than a morpheme but smaller than a word, in encoding rules of the grammar to explain the linguistic phenomena in a more formal and accurate way. Our implementation covers rules ranging from basic constituents such as adjective, adverbial, or prepositional phrases to more complex types with derivations such as sentential complements, sentential adjuncts, and relative clauses. The noun phrase subgrammar is the core of the system. Other important rules deal with several types of sentence structures, free word order, and coordination. Also, a date-time grammar developed earlier is integrated into our system. Some of the frequently occurring phenomena, such as causatives, passives, noun-verb compounds, and non-canonical objects, are also important from a theoretical perspective. We first examine their linguistic representation and then analyze the details of different types of causatives and non-canonical objects by conducting several tests. We then provide their implementation. To evaluate our grammar we have experimented with real-world data. Results show that we have a reasonably high coverage in noun phrases (85.5%). We have also integrated our system into a tool called LingBrowser.

    Building a wordnet for Turkish

    This paper summarizes the development process of a wordnet for Turkish as part of the Balkanet project. After discussing the basic methodological issues that had to be resolved during the course of the project, the paper presents the basic steps of the construction process in chronological order. Two applications using the Turkish wordnet are summarized, and links to resources for wordnet builders are provided at the end of the paper.

    Altsözcüksel birimlerle Türkçe için sözcüksel işlevsel gramer geliştirilmesi (Developing a lexical functional grammar for Turkish with sublexical units)

    This paper examines the use of sublexical units as a solution for handling the complex morphological structure and rich derivational phenomena of Turkish, and describes the proposed approach through the Turkish lexical functional grammar being implemented within the Pargram project. With the approach we follow, rules can be written in a more orderly and succinct manner, so that grammar coverage can be extended with rules that are both fewer in number, because the scope for generalization increases, and less complex in content. Moreover, the semantic contributions of derivations to words can be expressed systematically through PRED values created at run time. Our work covers the use of sublexical units with simple derivational suffixes and then turns to relatively more complex linguistic phenomena such as causative constructions. Because our primary aim is to examine our approach across linguistic areas that differ from one another as much as possible, this paper does not include a quantitative evaluation.

    Lexical Normalization for Code-switched Data and its Effect on POS Tagging

    Lexical normalization, the translation of non-canonical data to standard language, has been shown to improve the performance of many natural language processing tasks on social media. Yet, using multiple languages in one utterance, also called code-switching (CS), is frequently overlooked by these normalization systems, despite its common use in social media. In this paper, we propose three normalization models specifically designed to handle code-switched data, which we evaluate for two language pairs: Indonesian-English (Id-En) and Turkish-German (Tr-De). For the latter, we introduce novel normalization layers and their corresponding language ID and POS tags for the dataset, and evaluate the downstream effect of normalization on POS tagging. Results show that our CS-tailored normalization models outperform Id-En state of the art and Tr-De monolingual models, and lead to a 5.4% relative performance increase for POS tagging as compared to unnormalized input.

    Treebanking user-generated content: A proposal for a unified representation in universal dependencies

    The paper presents a discussion on the main linguistic phenomena of user-generated texts found in web and social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework. Given on the one hand the increasing number of treebanks featuring user-generated content, and its somewhat inconsistent treatment in these resources on the other, the aim of this paper is twofold: (1) to provide a short, though comprehensive, overview of such treebanks - based on available literature - along with their main features and a comparative analysis of their annotation criteria, and (2) to propose a set of tentative UD-based annotation guidelines, to promote consistent treatment of the particular phenomena found in these types of texts. The main goal of this paper is to provide a common framework for those teams interested in developing similar resources in UD, thus enabling cross-linguistic consistency, which is a principle that has always been in the spirit of UD

    CoNLL-UL: Universal Morphological Lattices for Universal Dependency Parsing

    Following the development of the universal dependencies (UD) framework and the CoNLL 2017 Shared Task on end-to-end UD parsing, we address the need for a universal representation of morphological analysis which on the one hand can capture a range of different alternative morphological analyses of surface tokens, and on the other hand is compatible with the segmentation and morphological annotation guidelines prescribed for UD treebanks. We propose the CoNLL universal lattices (CoNLL-UL) format, a new annotation format for word lattices that represent morphological analyses, and provide resources that obey this format for a range of typologically different languages. The resources we provide are harmonized with the two-level representation and morphological annotation in their respective UD v2 treebanks, thus enabling research on universal models for morphological and syntactic parsing, in both pipeline and joint settings, and presenting new opportunities in the development of UD resources for low-resource languages.